SAFARI: shape analysis for AI-segmented images

Background Recent developments to segment and characterize the regions of interest (ROI) within medical images have led to promising shape analysis studies. However, the procedures to analyze the ROI are arbitrary and vary by study. A tool to translate the ROI to analyzable shape representations and features is greatly needed. Results We developed SAFARI (shape analysis for AI-segmented images), an open-source R package with a user-friendly online tool kit for ROI labelling and shape feature extraction of segmented maps, provided by AI-algorithms or manual segmentation. We demonstrated that half of the shape features extracted by SAFARI were significantly associated with survival outcomes in a case study on 143 consecutive patients with stage I–IV lung cancer and another case study on 61 glioblastoma patients. Conclusions SAFARI is an efficient and easy-to-use toolkit for segmenting and analyzing ROI in medical images. It can be downloaded from the comprehensive R archive network (CRAN) and accessed at https://lce.biohpc.swmed.edu/safari/. Supplementary Information The online version contains supplementary material available at 10.1186/s12880-022-00849-8.


S1. Supplement to Shape Representations.
S1.1. Binary Matrix. Let M W ×L be a binary matrix representing an arbitrary Wby-L image, containing a 4-connected region, where the foreground and background are composed of ones and zeros, respectively. Additionally, we can represent each pixel in the image as a point in a 2-dimensional discrete plane, that is, each entry M wl ∈ M can be denoted as a point (l, w) ∈ N 2 . Furthermore, to differentiate between foreground and background points, let I R : N 2 → {0, 1} be the indicator function for an image matrix given by I R (l, w) = 1 if (l, w) is a foreground pixel, 0 if (l, w) is a background pixel.
The indicator function I R and distribution of points (l, w)'s will be used to recreate the region's contour in a two dimensional Cartesian plane, known as the polygonal chain. S1.2. Polygonal Chain. The entries of the binary matrix M W ×L that make up the contour of the region can be extracted by the Moore-Neighbor tracing algorithm, modified by Jacob's stopping criteria, with the 1) starting boundary point, 2) direction to traverse the boundary (clockwise or counter-clockwise), and 3) pixel connectivity (Gonzalez, Woods and Eddins, 2020). EBImage, and as a result also SAFARI, uses a 4-connectivity. Therefore, the first argument is trivial. For the starting boundary point, the point in the lowest left-most location is chosen. Specifically, let be the collection of points that make up the region and are located in the lowest y-coordinate such that (l 1 , w 1 ) =    S 1 |S| = 1 min l1 S otherwise Applying the Moore-Neighbor tracing algorithm to the region matrix, results in the points that make up the boundary of the region, specifically, where the boundary begins at the lowest left-most area of the region and traverse through the boundary in a clockwise direction. From the boundary points, we can create a sequence of points, known as the closed polygonal chain, that represents the boundary of the region by creating a closed and simple polygon. We can see the binary matrix and its corresponding polygonal chain in Figure S2. Let N = l,w I R (l, w) be the total number of points that make up the region and P (n+1)×2 be the collection of points that make up the closed polygonal chain of the region such that 1) n ≤ N is the number of boundary points, 2) P i· = (x i , y i ) ∈ N 2 , for i = 1, . . . , n + 1, and 3) (x 1 , y 1 ) = (x n+1 , y n+1 ). Through the polygonal chain, we can derive further two-and one-dimensional shape representations that can be used to compute specific descriptors. S1.3. Chain Code. The slope of a shape's contour can be approximated by the directional changes between two consecutive boundary points. These directional changes can be encoded to, essentially, assign a number (from 0 to 7) to each possible relative direction resulting in an encoding list, each element known as the chain code, that provides a compact representation of the shape's contour (Wirth, 2004;Agu, 2014). Let c be a 1 × n vector representing the chain codes of the polygonal chain where each entry c i ∈ N ∩ [0, 7] corresponds to a direction in the 8-way split of the unit circle and determined by a series of steps.
First, we determine the angle between the vector composed of the difference between the two consecutive points and the x-axis, that is, let θ i = arctan2(d i ) be the resulting angle where d i = P i· − P i+1· is the difference. Since θ i ∈ [−π, π], we have to transform the angle toθ such thatθ i ∈ [0, 2π). As a result, we can now determine the corresponding chain code Clearly, we can see that this procedure splits the unit circle into eight equal parts. Additionally, if the directional change does not exactly align within the eight splits, then the rounding operator will approximate the chain code to the nearest integer. S1.4. Curvature Chain Code. Let ∆c be a 1 × n vector representing the curvature chain code such that each entry is formed from a transformation of the difference between two consecutive chain codes, that is, let ∆d i = c i − c i+1 and This simple chain code derivation estimates the curvature and contains information on the convexity of the shape (Wirth, 2004). S1.5. Radial Lengths. Let r be a 1 × n vector of radial lengths, that is, each entry r i = P i· − P c , for i = 1, . . . , n, is the Euclidean distance from the boundary point to the shape's centroid. We define the centroid of the polygonal chain P c = (x c , y c ) as is the signed area of the shape, obtained using Gauss's area formula. Clearly, the radial lengths are not scale-invariant (as the Euclidean distance is not). Therefore, to properly analyze the structure of r, the individual radial lengths must be normalized. S1.6. Normalized Radial Lengths. Let r (n) be the maximum radial length in r such that we can introduce a 1 × n vector of normalized radial lengths, denoted asr, where each entryr i = r i / r (n) , i = 1, . . . , n. By normalizing the radial lengths, we have obtain a 1-dimensional signal that is scale-invariant and which we can use to analyze the fine details of the shape's contour (Wirth, 2004).

Table S1
Components of the output list resulting from the SAFARI procedure.

Component Description desc
A data.frame object of the shape features corresponding to each segmented ROI.
holes An integer matrix containing the holes within each ROI, labeled according to the regions.
id A character vector that is identical to the id argument.
k A specified factor to enlarge the polygonal chain by with the default being 3.
n Number of resulting segmented regions.
plg.chains A list object where each component is the polygonal chain of a segmented ROI.
regions An integer matrix containing the segmented ROI, labeled from largest to smallest.
*It takes about 0.3 seconds for SAFARI to run a moderate size binary image (300 × 300 pixels).

Table S2
Data notation for shape representations.

Name
Data Support  N Number of erosion steps that can be applied to the region before the area equals zero.

R+
Relationship between the area of the region and the square of its thickness.
R+ Approximate length of the boundary that makes up the region.
Wirth (2004) Fibre Width Fibrewidth(P ) = Areafilled(P ) / Fibrelength(P ) Wirth (2004) PCH Convex Area Measurement can be obtained by applying the same formula as in Filled Area to PCH.
Convex Perimeter Measurement can be obtained by applying the same formula as in Perimeter to PCH.
Roundness 6 Roundness(PCH) = 4π · Areafilled(P ) / Perimeter(PCH) 2 [0, 1] Relationship between the area of the region and the area of a circle with the same convex perimeter i.e. area-to-perimeter ratio.    pk log(pk) pk = P (ω <r < ω + ∆) R+ Probabilistic measure of how well the radial lengths can be estimated.   2 The holes within the region must be filled before computing the measurement.

3
This measurement applies to curved regions, unlike eccentricity.

5
Approximation worsens as the true size of the region increases.

6
This measurement excludes local irregularities which makes it relatively insensitive to irregular boundaries.

7
This measurement accounts for irregularities, such as an irregular boundary.

8
A measure less than one accounts for a region with an irregular boundary or containing holes.

9
As the measurement decreases, the degree to which the region is "curled up" increases.

10
The minimum value is obtained for a convex shape.

11
Relies on the 100-bin histogram of the radial lengths where pk is the k th entry and ω = k / 100, ∆ = 0.01.

Table S6
Comparison of the results obtained in our univariate study, compared to those in Wang et al. (2018). The p-value in Wang et al. (2018) corresponds to either the sum of the shape feature for all regions or the shape feature for the main region, whichever was more significant.

EBImage
This package provides general purpose functionality for image processing and analysis, by offering tools to segment cells and extract quantitative cellular descriptors and automating image processing tasks.

SAFARI
This package provides functionality for image processing and shape analysis in the context of segmented medical images generated by deep learning-based methods or standard image processing algorithms and produced from different medical imaging types. Specifically, offers tools to segment regions of interest and extract quantitative shape descriptors.